Skip to content

Bug fixed for DIANN #600

Merged
ypriverol merged 9 commits intomainfrom
dev
Feb 6, 2026
Merged

Bug fixed for DIANN #600
ypriverol merged 9 commits intomainfrom
dev

Conversation

@ypriverol
Copy link
Member

@ypriverol ypriverol commented Feb 6, 2026

Pull Request

Description

Brief description of the changes made in this PR.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Test addition/update
  • Updates to the dependencies has been done.

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved data validation to gracefully handle missing data without crashing.
    • Added automatic fallback logic for intensity column selection when preferred data is unavailable.
    • Visualization reports now skip rendering when required data is insufficient, with appropriate warnings logged.
  • Style

    • Updated visualization control configurations for enhanced chart interactivity.

yueqixuan and others added 6 commits January 23, 2026 09:42
Enhance percentage display
This fixes issue #598 where DIA-NN data processing would fail with certain
datasets like PXD063291.

Changes:
1. Fixed sdrf_pipelines import paths for compatibility with newer versions
   (0.1.0+) while maintaining backward compatibility with older versions.
   The module structure changed from sdrf_pipelines.openms to
   sdrf_pipelines.converters.openms.

2. Added robust column existence checks in dia_utils.py:
   - _prepare_quant_table_data: Now checks for required columns and falls
     back to Precursor.Quantity if Precursor.Normalised is missing
   - create_peptides_table: Handles missing Q.Value column gracefully
   - create_protein_table: Uses correct intensity column dynamically
   - draw_diann_quant_table: Skips tables gracefully when data is missing
   - _merge_condition_data: Uses correct intensity column for merging

3. Improved error handling to log warnings instead of raising exceptions
   when optional data is missing, allowing reports to complete with
   available data.

https://claude.ai/code/session_01VrgNLeqdhEHweocVpUmcRT
Keep using the original import paths from sdrf_pipelines (pre-0.1.0 version):
- sdrf_pipelines.openms.openms.UnimodDatabase
- sdrf_pipelines.openms.openms.OpenMS

The column check fixes for DIA-NN processing remain in place.

https://claude.ai/code/session_01VrgNLeqdhEHweocVpUmcRT
Added column existence checks to prevent crashes when DIA-NN reports
are missing expected columns:

- _process_diann_statistics: checks for Protein.Group, Modified.Sequence
- _process_peptide_search_scores: checks for Modified.Sequence, Q.Value
- _process_modifications: checks for Modified.Sequence
- _process_run_data: checks for Run, Modified.Sequence, Modifications, Protein.Group
- draw_dia_ids_rt: checks for Run, RT

Functions now return safe default values and log warnings when columns
are missing, allowing reports to complete with available data.

https://claude.ai/code/session_01VrgNLeqdhEHweocVpUmcRT
Add robust error handling and fallback logic to DIA-NN module
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 6, 2026

Warning

Rate limit exceeded

@ypriverol has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 17 minutes and 56 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📝 Walkthrough

Walkthrough

This PR adds input validation and defensive returns to DIA-NN report processing, introduces intensity column handling with fallback logic across data structures, and updates CPSwitch configuration flags in plotting functions to improve data integrity and control flow.

Changes

Cohort / File(s) Summary
DIA Utility Processing
pmultiqc/modules/common/dia_utils.py
Added validation guards to _process_diann_statistics, _process_peptide_search_scores, _process_modifications, and _process_run_data to return early with placeholders when required columns (Protein.Group, Modified.Sequence, Q.Value, Run, etc.) are missing. Introduced intensity column fallback logic that prefers Precursor.Normalised over Precursor.Quantity and propagates the selected column through report_data and cond_report_data attributes for downstream calculations.
Plotting Configuration
pmultiqc/modules/common/plots/dia.py, pmultiqc/modules/common/plots/ms.py
Added "cpswitch_c_active": False flag to draw_dia_whole_exp_charge and enabled CPSwitch control ("cpswitch": True) in three peak_intensity_distribution, distribution_of_precursor_charges, and peaks_per_ms2 plot configurations with corresponding cpswitch_c_active flags.

Sequence Diagram

sequenceDiagram
    participant Report as DIA Report
    participant Parser as parse_diann_report
    participant ModProc as _process_modifications
    participant RunProc as _process_run_data
    participant QuantTable as _prepare_quant_table_data
    participant Condition as _merge_condition_data
    participant PlotFunc as create_peptides_table/protein_table

    Report->>Parser: Input report data
    Parser->>ModProc: Process modifications
    alt Modifications present
        ModProc-->>Parser: Success (sets Modifications)
    else Modifications missing
        ModProc-->>Parser: Returns False (warning logged)
    end
    
    alt Modifications succeeded
        Parser->>RunProc: Process run data
        RunProc-->>Parser: cal_num_table_data (with validation)
    else Modifications failed
        Parser->>Parser: Use placeholder cal_num_table_data
    end
    
    Parser->>QuantTable: Prepare quantification table
    QuantTable->>QuantTable: Check Precursor.Normalised or fallback to Precursor.Quantity
    QuantTable->>QuantTable: Store intensity_col in attrs
    QuantTable-->>Parser: report_data (with intensity_col in attrs)
    
    Parser->>Condition: Merge condition data
    Condition->>Condition: Read intensity_col from attrs (with fallback)
    Condition->>Condition: Store intensity_col in cond_report_data.attrs
    Condition-->>Parser: cond_report_data
    
    Parser->>PlotFunc: Create quantification plots
    PlotFunc->>PlotFunc: Retrieve intensity_col from attrs
    PlotFunc->>PlotFunc: Calculate Average Intensity using dynamic column
    PlotFunc-->>PlotFunc: Generate plots or skip with warning if data insufficient
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • Minor changes #419: Modifies _process_modifications logic in dia_utils.py; this PR's modifications validation changes overlap with UnimodDatabase initialization centralization.
  • Fix bugs and add heatmap in DIA #324: Extensive DIA plotting refactoring including draw_dia_whole_exp_charge and parse_diann_report wiring; directly related to this PR's CPSwitch flag changes and plot guards.
  • Major Bug fixing for MAxQuant and DIANN datasets in PRIDE #265: Introduces robustness and validation changes to DIA-NN data processing and plotting (guarding required columns, conditional plotting) similar to this PR's defensive coding approach.

Suggested labels

Review effort 3/5

Suggested reviewers

  • daichengxin
  • yueqixuan

Poem

🐰 A validation rabbit hops with care,
Guards each column, checks with flair,
Intensity flows through attrs so neat,
Fallbacks ready when data's not complete,
CPSwitch flags dance in the light! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 56.25% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The pull request title 'Dev' is vague and generic, providing no meaningful information about the changeset. Replace with a descriptive title that reflects the main change, such as 'Add robust DIA-NN processing with input validation and safe defaults' or similar.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch dev

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ypriverol ypriverol changed the title Dev Bug fixed for DIANN Feb 6, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@pmultiqc/modules/common/dia_utils.py`:
- Around line 160-165: The callers pass pep_plot from _process_diann_statistics
into draw_num_pep_per_protein and will crash if pep_plot is None; add a guard at
the start of draw_num_pep_per_protein to check for pep_plot is None, log or warn
about missing peptide plot data, and return a safe value (e.g., None or empty
figure) instead of accessing pep_plot.dict["data"]. Update
draw_num_pep_per_protein (and any closely related plotting helpers) to
early-return when pep_plot is None so quantms.py and diann.py no longer need to
change their call sites.

Comment on lines 160 to +165
def _process_diann_statistics(report_data):
"""Process DIA-NN statistics and create peptide plot."""
required_cols = ["Protein.Group", "Modified.Sequence"]
if not all(col in report_data.columns for col in required_cols):
log.warning(f"Missing required columns for statistics: {[c for c in required_cols if c not in report_data.columns]}")
return 0, 0, None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for parse_diann_report usage and context
rg -n --type=py -C5 'parse_diann_report' -g '!*test*'

Repository: bigbio/pmultiqc

Length of output: 3955


🏁 Script executed:

#!/bin/bash
# Get the full parse_diann_report function implementation
sed -n '32,150p' pmultiqc/modules/common/dia_utils.py

Repository: bigbio/pmultiqc

Length of output: 4386


🏁 Script executed:

#!/bin/bash
# Get the _process_diann_statistics function
sed -n '160,200p' pmultiqc/modules/common/dia_utils.py

Repository: bigbio/pmultiqc

Length of output: 1759


🏁 Script executed:

#!/bin/bash
# Check how pep_plot is used in quantms.py after assignment
rg -n --type=py 'self.pep_plot' pmultiqc/modules/quantms/quantms.py -A3 -B1

Repository: bigbio/pmultiqc

Length of output: 1837


🏁 Script executed:

#!/bin/bash
# Check how pep_plot is used in diann.py after assignment
rg -n --type=py 'self.pep_plot' pmultiqc/modules/diann/diann.py -A3 -B1

Repository: bigbio/pmultiqc

Length of output: 420


🏁 Script executed:

#!/bin/bash
# Find the function that receives pep_plot
rg -n --type=py 'def.*draw_num_pep_per_protein' -A5

Repository: bigbio/pmultiqc

Length of output: 476


🏁 Script executed:

#!/bin/bash
# Check if there's any None-checking for pep_plot before calling draw_num_pep_per_protein
rg -n --type=py 'draw_num_pep_per_protein' -B5 -A1 pmultiqc/modules/diann/diann.py

Repository: bigbio/pmultiqc

Length of output: 623


🏁 Script executed:

#!/bin/bash
# Check the full draw_num_pep_per_protein function to see how pep_plot is used
sed -n '1001,1030p' pmultiqc/modules/common/plots/id.py

Repository: bigbio/pmultiqc

Length of output: 901


🏁 Script executed:

#!/bin/bash
# Check the same in quantms.py for draw_num_pep_per_protein usage
rg -n --type=py 'draw_num_pep_per_protein' -B10 -A5 pmultiqc/modules/quantms/quantms.py

Repository: bigbio/pmultiqc

Length of output: 1802


🏁 Script executed:

#!/bin/bash
# Check if there's any None-check around the draw_num_pep_per_protein calls in quantms.py
sed -n '350,365p' pmultiqc/modules/quantms/quantms.py

Repository: bigbio/pmultiqc

Length of output: 648


🏁 Script executed:

#!/bin/bash
# Check the broader context around draw_num_pep_per_protein in quantms.py
sed -n '340,365p' pmultiqc/modules/quantms/quantms.py

Repository: bigbio/pmultiqc

Length of output: 1039


🏁 Script executed:

#!/bin/bash
# Check broader context in quantms.py around draw_num_pep_per_protein calls
sed -n '320,365p' pmultiqc/modules/quantms/quantms.py

Repository: bigbio/pmultiqc

Length of output: 1888


🏁 Script executed:

#!/bin/bash
# Double-check if there's any conditional logic around draw_num_pep_per_protein calls
sed -n '355,365p' pmultiqc/modules/quantms/quantms.py && echo "---" && sed -n '185,195p' pmultiqc/modules/diann/diann.py

Repository: bigbio/pmultiqc

Length of output: 688


Callers do not handle pep_plot=None gracefully—this will crash at runtime.

When required columns are missing, _process_diann_statistics returns None for pep_plot. Both quantms.py (line 358) and diann.py (line 187) pass this directly to draw_num_pep_per_protein, which immediately accesses pep_plot.dict["data"] (id.py:1006) without a None check. This causes an AttributeError if column validation fails.

Add a guard in draw_num_pep_per_protein or skip the call when pep_plot is None.

🤖 Prompt for AI Agents
In `@pmultiqc/modules/common/dia_utils.py` around lines 160 - 165, The callers
pass pep_plot from _process_diann_statistics into draw_num_pep_per_protein and
will crash if pep_plot is None; add a guard at the start of
draw_num_pep_per_protein to check for pep_plot is None, log or warn about
missing peptide plot data, and return a safe value (e.g., None or empty figure)
instead of accessing pep_plot.dict["data"]. Update draw_num_pep_per_protein (and
any closely related plotting helpers) to early-return when pep_plot is None so
quantms.py and diann.py no longer need to change their call sites.

@ypriverol ypriverol merged commit 1a5bc9d into main Feb 6, 2026
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants